NSF PAR Search | NSF Public Access Repository

FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms.

Qiu, H; Mao, W; Patke, A; Cui, S; Wang, C; Franke, H; Kalbarczyk, Z; Başar, T; Iyer, R (September 2024, MLSys)
Gibbons, Phillip B; Pekhimenko, Gennady; De Sa, Christopher (Ed.)
The emergence of ML in various cloud system management tasks (e.g., workload autoscaling and job scheduling) has become a core driver of ML-centric cloud platforms. However, there are still numerous algorithmic and systems challenges that prevent ML-centric cloud platforms from being production-ready. In this paper, we focus on the challenges of model performance variability and costly model retraining, introduced by dynamic workload patterns and heterogeneous applications and infrastructures in cloud environments. To address these challenges, we present FLASH, an extensible framework for fast model adaptation in ML-based system management tasks. We show how FLASH leverages existing ML agents and their training data to learn to generalize across applications/environments with meta-learning. FLASH can be easily integrated with an existing ML-based system management agent with a unified API. We demonstrate the use of FLASH by implementing three existing ML agents that manage (1) resource configurations, (2) autoscaling, and (3) server power. Our experiments show that FLASH enables fast adaptation to new, previously unseen applications/environments (e.g., 5.5× faster than transfer learning in the autoscaling task), indicating significant potential for adopting ML-centric cloud platforms in production.
Full Text Available
Power-aware Deep Learning Model Serving with µ-Serve. In Proceedings of the 2024 USENIX Annual Technical Conference (ATC 2024).

Qiu, H; Mao, W; Patke, A; Cui, S; Jha, S; Wang, C; Franke, H; Kalbarczyk, Z; Basar, T; Iyer, R (September 2024, USENIX ATC 2024)
Begnum, Kyrre; Border, Charles (Ed.)
With the increasing popularity of large deep learning model-serving workloads, there is a pressing need to reduce the energy consumption of a model-serving cluster while still satisfying throughput or model-serving latency requirements. Model multiplexing approaches such as model parallelism, model placement, replication, and batching aim to optimize the model-serving performance. However, they fall short of leveraging the GPU frequency scaling opportunity for power saving. In this paper, we demonstrate (1) the benefits of GPU frequency scaling in power saving for model serving; and (2) the necessity for co-design and optimization of fine-grained model multiplexing and GPU frequency scaling. We explore the co-design space and present a novel power-aware model-serving system, μ-Serve. μ-Serve is a model-serving framework that optimizes the power consumption and model-serving latency/throughput of serving multiple ML models efficiently in a homogeneous GPU cluster. Evaluation results on production workloads show that μ-Serve achieves 1.2–2.6× power saving by dynamic GPU frequency scaling (up to 61% reduction) without SLO attainment violations.
Full Text Available
Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity

Mao, W; Qiu, H; Wang, C; Franke, H; Kalbarczyk, Z; Iyer, R; Basar, T (April 2024, NeurIPS)
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in isolation, while in practice the environment is often evolving, leaving many related tasks to be solved. In this paper, we investigate the benefits of meta-learning in solving multiple MARL tasks collectively. We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings, including learning Nash equilibria in two-player zero-sum Markov games and Markov potential games, as well as learning coarse correlated equilibria in general-sum Markov games. Under natural notions of task similarity, we show that meta-learning achieves provably sharper convergence to various game-theoretical solution concepts than learning each task separately. As an important intermediate step, we develop multiple MARL algorithms with initialization-dependent convergence guarantees. Such algorithms integrate optimistic policy mirror descent with stage-based value updates, and their refined convergence guarantees (nearly) recover the best known results even when a good initialization is unknown. To the best of our knowledge, such results are also new and might be of independent interest. We further provide numerical simulations to corroborate our theoretical findings.
Full Text Available
When Green Computing Meets Performance and Resilience SLOs.

https://doi.org/10.1109/DSN-S60304.2024

Qiu, H; Mao, W; Wang, C; Jha, S; Franke, H; Narayanaswami, C; Kalbarczyk, Z T; Basar, T; Iyer, R K (January 2024, Institute of Electrical and Electronics Engineers)
This paper addresses the urgent need to transition to global net-zero carbon emissions by 2050 while retaining the ability to meet joint performance and resilience objectives. The focus is on the computing infrastructures, such as hyperscale cloud datacenters, that consume significant power, thus producing increasing amounts of carbon emissions. Our goal is to (1) optimize the usage of green energy sources (e.g., solar energy), which is desirable but expensive and relatively unstable, and (2) continuously reduce the use of fossil fuels, which have a lower cost but a significant negative societal impact. Meanwhile, cloud datacenters strive to meet their customers’ requirements, e.g., service-level objectives (SLOs) in application latency or throughput, which are impacted by infrastructure resilience and availability. We propose a scalable formulation that combines sustainability, cloud resilience, and performance as a joint optimization problem with multiple interdependent objectives to address these issues holistically. Given the complexity and dynamicity of the problem, machine learning (ML) approaches, such as reinforcement learning, are essential for achieving continuous optimization. Our study highlights the challenges of green energy instability, which necessitates innovative ML-centric solutions across heterogeneous infrastructures to manage the transition towards green computing. Underlying the ML-centric solutions must be methods to combine classic system resilience techniques with innovations in real-time ML resilience (not addressed heretofore). We believe that this approach will not only set a new direction in the resilient, SLO-driven adoption of green energy but also enable us to manage future sustainable systems in ways that were not possible before.
Full Text Available
Dynamic compression of water to conditions in ice giant interiors

https://doi.org/10.1038/s41598-021-04687-6

Gleason, A. E.; Rittman, D. R.; Bolme, C. A.; Galtier, E.; Lee, H. J.; Granados, E.; Ali, S.; Lazicki, A.; Swift, D.; Celliers, P.; et al (December 2022, Scientific Reports)

Recent discoveries of water-rich Neptune-like exoplanets require a more detailed understanding of the phase diagram of H₂O at pressure–temperature conditions relevant to their planetary interiors. The unusual non-dipolar magnetic fields of ice giant planets, produced by convecting liquid ionic water, are influenced by exotic high-pressure states of H₂O, yet the structure of ice in this state is challenging to determine experimentally. Here we present X-ray diffraction evidence of a body-centered cubic (BCC) structured H₂O ice at 200 GPa and ~5000 K, deemed ice XIX, using the X-ray Free Electron Laser of the Linac Coherent Light Source to probe the structure of the oxygen sub-lattice during dynamic compression. Although several cubic or orthorhombic structures have been predicted to be the stable structure at these conditions, we show this BCC ice phase is stable to multi-Mbar pressures and temperatures near the melt boundary. This suggests variable and increased electrical conductivity to greater depths in ice giant planets that may promote the generation of multipolar magnetic fields.
Full Text Available
Atomistic deformation mechanism of silicon under laser-driven shock compression

https://doi.org/10.1038/s41467-022-33220-0

Pandolfi, Silvia; Brown, S. Brennan; Stubley, P. G.; Higginbotham, Andrew; Bolme, C. A.; Lee, H. J.; Nagler, B.; Galtier, E.; Sandberg, R. L.; Yang, W.; et al (September 2022, Nature Communications)

Silicon (Si) is one of the most abundant elements on Earth, and it is the most widely used semiconductor. Despite extensive study, some properties of Si, such as its behaviour under dynamic compression, remain elusive. A detailed understanding of Si deformation is crucial for various fields, ranging from planetary science to materials design. Simulations suggest that in Si the shear stress generated during shock compression is released via a high-pressure phase transition, challenging the classical picture of relaxation via defect-mediated plasticity. However, direct evidence supporting either deformation mechanism remains elusive. Here, we use sub-picosecond, highly-monochromatic x-ray diffraction to study (100)-oriented single-crystal Si under laser-driven shock compression. We provide the first unambiguous, time-resolved picture of Si deformation at ultra-high strain rates, demonstrating the predicted shear release via phase transition. Our results resolve the longstanding controversy on silicon deformation and provide direct proof of strain rate-dependent deformation mechanisms in a non-metallic system.